Abstract

Given a finite-valued sample $X_1,\dots,X_n$ we wish to test whether it was generated by a stationary ergodic process belonging to a family $H_0$, or it was generated by a stationary ergodic process outside $H_0$. We require the Type I error of the test to be uniformly bounded, while the Type II error has to be made not more than a finite number of times with probability 1. For this notion of consistency we provide necessary and sufficient conditions on the family $H_0$ for the existence of a consistent test. This criterion is illustrated with applications to testing for membership to parametric families, generalizing some existing results. In addition, we analyze a stronger notion of consistency, which requires finite-sample guarantees on errors of both types, and provide some necessary and some sufficient conditions for the existence of a consistent test. We emphasize that no assumptions on the process distributions are made beyond stationarity and ergodicity.

Keywords: Hypothesis testing, stationary processes, ergodic processes, distributional distance. 

1 Introduction 

Given a sample $X_1,\dots,X_n$ (where the $X_i$ take values in a finite alphabet $A$) that is known to be generated by a stationary ergodic process, we wish to decide whether it was generated by a distribution belonging to a certain family $H_0$, or by a stationary ergodic distribution that does not belong to $H_0$. Unlike most of the works on the subject, we do not assume that the $X_i$ are i.i.d., but only make the much weaker assumption that the distribution generating the sample is stationary ergodic.

A test is a function that takes a sample and an additional parameter $\alpha$ (the significance level), and gives a binary (possibly incorrect) answer: the sample was generated by a distribution from $H_0$, or by a stationary ergodic distribution not belonging to $H_0$. Here we are concerned with characterizing those families $H_0$ for which consistent tests exist.
We consider the following notion of consistency. Call a test consistent if, for any pre-specified level $\alpha\in(0,1)$, any sample size $n$ and any distribution in $H_0$, the probability of Type I error (the test says "not $H_0$") is not greater than $\alpha$, while for every stationary ergodic distribution from outside $H_0$ and every $\alpha$, Type II error (the test says "$H_0$") is made only a finite number of times (as the sample size goes to infinity) with probability 1. This notion of consistency represents a classical statistical approach to the problem, and suits situations where the hypothesis $H_0$ is considerably simpler than the alternative, for example when $H_0$ consists of just one distribution, when it is some parametric family, or when it is the hypothesis of homogeneity or that of independence.
Prior work. There is a vast body of literature on hypothesis testing for i.i.d. (real- or discrete-valued) data (see e.g. [8]). In the context of discrete-valued i.i.d. data, the necessary and sufficient conditions for the existence of a consistent test are rather simple to obtain: there is a consistent test for $H_0$ (against "i.i.d. but not $H_0$") if and only if $H_0$ is closed, where the topology is that of the parameter space (probabilities of each symbol), see e.g. [1]. Consistency being easy to ensure, the prime concern for the case of i.i.d. data is optimality.

There is, however, much less literature on hypothesis testing beyond i.i.d. or parametric models, while the question of determining whether a consistent test exists (for different notions of consistency and different hypotheses) is much less trivial. For a weaker notion of consistency, namely, requiring that the test should stabilize on the correct answer for a.e. realization of the process (under either $H_0$ or the alternative), [7] constructs a consistent test for so-called constrained finite-state model classes (including finite-state Markov and hidden Markov processes), against the general alternative of stationary ergodic processes. For the same notion of consistency, [10] gives sufficient conditions on two families $H_0$ and $H_1$ that consist of stationary ergodic real-valued processes, under which a consistent continuous test exists, extending the results of [5] for i.i.d. data. The latter condition is that $H_0$ and $H_1$ are contained in disjoint $F_\sigma$ sets (countable unions of closed sets), with respect to the topology of weak convergence. For the notion of consistency that we consider, consistent tests for some specific hypotheses, but under the general alternative of stationary ergodic processes, have been proposed in [HIIHIIII], which address problems of testing identity, independence, estimating the order of a Markov process, and also the change point problem. Some impossibility results for testing hypotheses about stationary ergodic processes can be found in [9, 13].

The results. The aim of this work is to provide topological characterizations 
of the hypotheses for which consistent tests exist, for the case of stationary 
ergodic distributions. The obtained characterization is rather similar to those
mentioned above for the case of i.i.d. data, but it is with respect to the topology
of distributional distance (or weak convergence). The fact that necessary and
sufficient conditions are obtained indicates that this topology is the right one 
to consider. 

A distributional distance between two process distributions is defined as a weighted sum of the differences between the probabilities of all possible tuples $X\in A^*$, where $A$ is the alphabet and the weights are positive and have a finite sum. The main result is the following theorem (formalized in the next sections).

Theorem. There exists a consistent test for $H_0$ if and only if $H_0$ has probability 1 with respect to the ergodic decomposition of every distribution from the closure of $H_0$.

The test that we construct to establish this result is based on empirical estimates of distributional distance. For a given level $\alpha$, it takes the smallest $\varepsilon$-neighbourhood of the closure of $H_0$ that has probability not less than $1-\alpha$ with respect to every ergodic process in it, and outputs 0 if the sample falls into this neighbourhood, and 1 otherwise.

To illustrate the applicability of the main result, we show that the families of $k$-order Markov processes and $k$-state hidden Markov processes (for any natural $k$) satisfy the conditions of the theorem, and therefore there exists a consistent test for membership to these families.

It should be emphasized that the results of this work concern what is possible 
in principle; finding an efficient testing procedure for each specific hypothesis for 
which we can demonstrate existence of a consistent test is a different problem. 



2 Preliminaries 

Let $A$ be a finite alphabet, and denote $A^*$ the set of words (or tuples) $\bigcup_{i\in\mathbb N}A^i$ and $A^\infty$ the set of all one-way infinite sequences. For a word $B\in A^*$ the symbol $|B|$ stands for the length of $B$. Distributions, or (stochastic) processes, are measures on the space $(A^\infty,\mathcal F_{A^\infty})$, where $\mathcal F_{A^\infty}$ is the Borel sigma-algebra of $A^\infty$. Denote $\#(X,B)$ the number of occurrences of a word $B\in A^*$ in a word $X\in A^*$ and $\nu(X,B)$ its frequency:

$$\#(X,B)=\sum_{i=1}^{|X|-|B|+1}\mathbb I_{\{(X_i,\dots,X_{i+|B|-1})=B\}},$$

and

$$\nu(X,B)=\begin{cases}\dfrac{1}{|X|-|B|+1}\,\#(X,B) & \text{if } |X|\ge|B|,\\[4pt] 0 & \text{otherwise,}\end{cases}\tag{1}$$

where $X=(X_1,\dots,X_{|X|})$. For example, $\nu(0001,00)=2/3$.
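Definition (1) can be transcribed directly, which makes the counting convention (overlapping occurrences, normalization by $|X|-|B|+1$) easy to check. The sketch below is only an illustration: words are encoded as Python strings over the alphabet, the names `occurrences` and `frequency` are hypothetical, and exact fractions are used so that the example $\nu(0001,00)=2/3$ is reproduced exactly.

```python
from fractions import Fraction

# Illustrative sketch; the helper names are hypothetical.


def occurrences(X: str, B: str) -> int:
    """#(X, B): number of (possibly overlapping) occurrences of the word B in X."""
    return sum(1 for i in range(len(X) - len(B) + 1) if X[i:i + len(B)] == B)


def frequency(X: str, B: str) -> Fraction:
    """nu(X, B) as in (1): #(X, B) / (|X| - |B| + 1), and 0 when |X| < |B|."""
    if len(X) < len(B):
        return Fraction(0)
    return Fraction(occurrences(X, B), len(X) - len(B) + 1)


print(frequency("0001", "00"))  # 2/3, as in the example above
```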

We use the abbreviation $X_{1..k}$ for $X_1,\dots,X_k$. A process $\rho$ is stationary if

$$\rho(X_{1..|B|}=B)=\rho(X_{t..t+|B|-1}=B)$$

for any $B\in A^*$ and $t\in\mathbb N$. Denote $\mathcal S$ the set of all stationary processes on $A^\infty$. A stationary process $\rho$ is called (stationary) ergodic if the frequency of occurrence of each word $B$ in a sequence $X_1,X_2,\dots$ generated by $\rho$ tends to its a priori (or limiting) probability a.s.: $\rho\bigl(\lim_{n\to\infty}\nu(X_{1..n},B)=\rho(X_{1..|B|}=B)\bigr)=1$. By virtue of the ergodic theorem (e.g. [3]), this definition can be shown to be equivalent to the standard definition of stationary ergodic processes (every shift-invariant set has measure 0 or 1; see e.g. [...]). Denote $\mathcal E$ the set of all stationary ergodic processes.
Definition 1 (distributional distance). The distributional distance is defined for a pair of processes $\rho_1,\rho_2$ as follows [...]:

$$d(\rho_1,\rho_2)=\sum_{k=1}^{\infty}w_k\,\bigl|\rho_1(X_{1..|B_k|}=B_k)-\rho_2(X_{1..|B_k|}=B_k)\bigr|,$$

where $w_k$, $k\in\mathbb N$, are positive weights with a finite sum, and $B_k$, $k\in\mathbb N$, range through the set $A^*$ of all words in length-lexicographical order (the weights and ordering are fixed for the sake of concreteness only).
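As a numerical illustration of Definition 1, the sketch below evaluates $d$ between two binary i.i.d. processes. It assumes the weights $w_k=2^{-k}$ (one admissible choice of positive weights with finite sum, not necessarily the one intended above) and keeps only the first `n_terms` summands, so the result is accurate up to the omitted tail weight $2^{-n_{\text{terms}}}$. The helpers `words`, `iid_prob` and `distributional_distance` are hypothetical names.

```python
from functools import partial
from itertools import islice, product

# Illustrative sketch; helper names are hypothetical and w_k = 2^-k is an assumption.


def words(alphabet: str):
    """Yield B_1, B_2, ...: all words over the alphabet in length-lexicographic order."""
    length = 1
    while True:
        for letters in product(sorted(alphabet), repeat=length):
            yield "".join(letters)
        length += 1


def iid_prob(p1: float, B: str) -> float:
    """rho(X_{1..|B|} = B) for the binary i.i.d. process with P(X_t = '1') = p1."""
    ones = B.count("1")
    return p1 ** ones * (1 - p1) ** (len(B) - ones)


def distributional_distance(prob1, prob2, alphabet: str = "01", n_terms: int = 200) -> float:
    """First n_terms summands of d(rho1, rho2), with the assumed weights w_k = 2^-k."""
    total = 0.0
    for k, B in enumerate(islice(words(alphabet), n_terms), start=1):
        total += 2.0 ** -k * abs(prob1(B) - prob2(B))
    return total


rho1, rho2 = partial(iid_prob, 0.3), partial(iid_prob, 0.6)
print(distributional_distance(rho1, rho2))
```

Because every summand is bounded by its weight, the truncation error can be controlled explicitly, which is also the idea behind the consistency of the empirical estimates used later.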

It is easy to see that $d$ is a metric. Equipped with this metric, the space of all stochastic processes is compact, and the set of stationary processes $\mathcal S$ is its convex closed subset. (The set $\mathcal E$ is not closed.) We refer to [...] for more information on the metric $d$, including the proofs of the mentioned facts. When talking about closed and open subsets of $\mathcal S$ we assume the topology of $d$. For $H\subset\mathcal S$, denote $\operatorname{cl}H$ the closure of $H$.

Compactness of the set $\mathcal S$ is one of the main ingredients of the analysis in this work. Another is that the distance $d$ can be consistently estimated, as is demonstrated in Lemma 1 of Section 6 below (see also [14]).

Considering the Borel (with respect to the metric $d$) sigma-algebra $\mathcal F_{\mathcal S}$ on the set $\mathcal S$, we obtain a standard probability space $(\mathcal S,\mathcal F_{\mathcal S})$. An important tool that will be used in the analysis is the ergodic decomposition of stationary processes (see e.g. [6, 3]), which we recall here. Any stationary process can be expressed as a mixture of stationary ergodic processes; more formally, for any $\rho\in\mathcal S$ there is a measure $W_\rho$ on $(\mathcal S,\mathcal F_{\mathcal S})$ such that $W_\rho(\mathcal E)=1$ and $\rho(B)=\int dW_\rho(\mu)\,\mu(B)$ for any $B\in\mathcal F_{A^\infty}$. The support of a stationary distribution $\rho$ is the minimal closed set $U\subset\mathcal S$ such that $W_\rho(U)=1$.
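A finitely supported $W_\rho$ gives a concrete picture of the ergodic decomposition. In the sketch below (an illustrative assumption, not an example from the text, with hypothetical function names) $W_\rho$ puts weight 1/2 on each of two binary i.i.d. processes with $P(X_t=1)$ equal to 1/5 and 4/5. The mixture $\rho$ is stationary, but $\rho(11)\ne\rho(1)^2$, and the frequencies along a single realization converge to the parameter of the component that generated it rather than to $\rho(B)$, so $\rho$ is not ergodic.

```python
import random
from fractions import Fraction

# Illustrative sketch; W and the helper names are hypothetical.


def iid_prob(p1, B: str):
    """mu(B) for the binary i.i.d. process with P(X_t = '1') = p1."""
    ones = B.count("1")
    return p1 ** ones * (1 - p1) ** (len(B) - ones)


# Finitely supported decomposition W_rho: (weight, parameter of an ergodic component).
W = [(Fraction(1, 2), Fraction(1, 5)), (Fraction(1, 2), Fraction(4, 5))]


def mixture_prob(B: str):
    """rho(B) = integral of mu(B) dW_rho(mu), here a finite sum."""
    return sum(weight * iid_prob(p1, B) for weight, p1 in W)


print(mixture_prob("1"), mixture_prob("11"))  # 1/2 17/50 (not (1/2)**2)


def sample_mixture(n: int, rng: random.Random):
    """Draw one component mu ~ W_rho, then X_1..X_n i.i.d. from mu."""
    p1 = rng.choices([p for _, p in W], weights=[float(w) for w, _ in W])[0]
    x = "".join("1" if rng.random() < p1 else "0" for _ in range(n))
    return p1, x


p1, x = sample_mixture(20000, random.Random(0))
# The frequency of '1' approaches the chosen component's parameter (1/5 or 4/5),
# not rho('1') = 1/2.
print(float(p1), x.count("1") / len(x))
```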

A test is a function $\psi^\alpha:A^*\to\{0,1\}$ that takes as input a sample and a parameter $\alpha\in(0,1)$, and outputs a binary answer, where the answer 0 is interpreted as "the sample was generated by a distribution that belongs to $H_0$", and the answer 1 as "the sample was generated by a stationary ergodic distribution that does not belong to $H_0$." A test makes a Type I error if it says 1 while $H_0$ is true, and it makes a Type II error if it says 0 while $H_0$ is false.

Definition 2 (consistency). Call a family of tests $\psi^\alpha$, $\alpha\in(0,1)$, consistent as a test of $H_0$ against $H_1$ if:

(i) The probability of Type I error is always bounded by $\alpha$: $\rho\{X\in A^n:\psi^\alpha(X)=1\}\le\alpha$ for every $\rho\in H_0$, every $n\in\mathbb N$ and every $\alpha\in(0,1)$, and

(ii) Type II error is made not more than a finite number of times with probability 1: $\rho\bigl(\lim_{n\to\infty}\psi^\alpha(X_{1..n})=1\bigr)=1$ for every $\rho\in H_1$ and every $\alpha\in(0,1)$.
3 Main results



The test constructed below is based on empirical estimates of the distributional distance $d$:

$$d(X_{1..n},\rho)=\sum_{i=1}^{\infty}w_i\,\bigl|\nu(X_{1..n},B_i)-\rho(B_i)\bigr|,$$

where $n\in\mathbb N$, $\rho\in\mathcal S$, $X_{1..n}\in A^n$, and $\rho(B_i)$ is shorthand for $\rho(X_{1..|B_i|}=B_i)$. That is, $d(X_{1..n},\rho)$ measures the discrepancy between empirically estimated and theoretical probabilities. For a sample $X_{1..n}\in A^n$ and a hypothesis $H\subset\mathcal E$ define

$$d(X_{1..n},H):=\inf_{\rho\in H}d(X_{1..n},\rho).$$
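The empirical distance and its infimum over a hypothesis can be sketched for the simplest setting, where $H$ is a finite set of binary i.i.d. processes, so the infimum is a minimum. The truncation to words of length at most 4, the weights $w_i=2^{-i}$ and all function names are assumptions made for this illustration only.

```python
import random
from itertools import product

# Illustrative sketch; helper names are hypothetical.
MAX_LEN = 4  # assumption: d is truncated to words with |B| <= MAX_LEN
WORDS = ["".join(t) for n in range(1, MAX_LEN + 1) for t in product("01", repeat=n)]


def frequency(X: str, B: str) -> float:
    """nu(X, B) from (1)."""
    if len(X) < len(B):
        return 0.0
    hits = sum(1 for i in range(len(X) - len(B) + 1) if X[i:i + len(B)] == B)
    return hits / (len(X) - len(B) + 1)


def iid_prob(p1: float, B: str) -> float:
    """rho(B) for the binary i.i.d. process with P(X_t = '1') = p1."""
    return p1 ** B.count("1") * (1 - p1) ** B.count("0")


def empirical_distance(X: str, p1: float) -> float:
    """Truncated d(X_{1..n}, rho) with the assumed weights w_i = 2^-i."""
    return sum(2.0 ** -i * abs(frequency(X, B) - iid_prob(p1, B))
               for i, B in enumerate(WORDS, start=1))


def distance_to_hypothesis(X: str, H) -> float:
    """d(X_{1..n}, H); for a finite H the infimum over rho in H is a minimum."""
    return min(empirical_distance(X, p1) for p1 in H)


rng = random.Random(1)
X = "".join("1" if rng.random() < 0.3 else "0" for _ in range(2000))
print(distance_to_hypothesis(X, [0.3]), distance_to_hypothesis(X, [0.7]))
```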

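Construct the test $\psi^\alpha_{H_0}$, $\alpha\in(0,1)$, as follows. For each $n\in\mathbb N$, $\delta>0$ and $H\subset\mathcal E$ define the neighbourhood $u_n^\delta(H)$ of $n$-tuples around $H$ as

$$u_n^\delta(H):=\{X\in A^n: d(X,H)\le\delta\}.$$

Moreover, let

$$\gamma_n(H,\theta):=\inf\Bigl\{\delta:\ \inf_{\rho\in H}\rho\bigl(u_n^\delta(H)\bigr)\ge\theta\Bigr\}$$

be the smallest radius of a neighbourhood around $H$ that has probability not less than $\theta$ with respect to every process in $H$, and let $C_n(H,\theta):=u_n^{\gamma_n(H,\theta)}(H)$ be a neighbourhood of this radius. Define

$$\psi^\alpha_{H_0}(X_{1..n}):=\begin{cases}0 & \text{if } X_{1..n}\in C_n(\operatorname{cl}H_0\cap\mathcal E,\,1-\alpha),\\ 1 & \text{otherwise.}\end{cases}$$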
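For a finite family of i.i.d. processes (which is closed and trivially satisfies $W_\rho(H_0)=1$ on its closure) the construction of $\psi^\alpha_{H_0}$ can be imitated by simulation. In the sketch below, our own illustration under stated assumptions (binary alphabet, $d$ truncated to words of length at most 3 with $w_i=2^{-i}$, Monte Carlo in place of exact probabilities, hypothetical function names), $\gamma_n(H,1-\alpha)$ is estimated as the largest of the per-process $(1-\alpha)$-quantiles of $d(X_{1..n},H)$, which is what the definition reduces to when $H$ is finite.

```python
import math
import random
from itertools import product

# Illustrative sketch with hypothetical names. Assumptions: binary alphabet, d truncated
# to words of length <= 3 with w_i = 2^-i, H a finite set of i.i.d. processes (P('1')).
WORDS = ["".join(t) for n in range(1, 4) for t in product("01", repeat=n)]
WEIGHTS = [2.0 ** -i for i in range(1, len(WORDS) + 1)]


def frequencies(X):
    """Vector of nu(X, B) over WORDS."""
    out = []
    for B in WORDS:
        positions = len(X) - len(B) + 1
        hits = sum(1 for i in range(positions) if X[i:i + len(B)] == B)
        out.append(hits / positions if positions > 0 else 0.0)
    return out


def dist_to_H(X, H):
    """d(X, H) = min over rho in H of sum_i w_i |nu(X, B_i) - rho(B_i)|."""
    f = frequencies(X)
    return min(sum(w * abs(fb - p1 ** B.count("1") * (1 - p1) ** B.count("0"))
                   for w, fb, B in zip(WEIGHTS, f, WORDS)) for p1 in H)


def sample(p1, n, rng):
    """X_1..X_n from the binary i.i.d. process with P(X_t = '1') = p1."""
    return "".join("1" if rng.random() < p1 else "0" for _ in range(n))


def gamma(H, n, theta, reps, rng):
    """Monte Carlo estimate of gamma_n(H, theta): the largest over rho in H of the
    empirical theta-quantile of d(X_{1..n}, H) under rho."""
    radius = 0.0
    for p1 in H:
        ds = sorted(dist_to_H(sample(p1, n, rng), H) for _ in range(reps))
        radius = max(radius, ds[math.ceil(theta * reps) - 1])
    return radius


def make_test(H, n, alpha, reps=200, seed=0):
    """psi^alpha on samples of length n: 0 inside C_n(H, 1 - alpha), 1 outside."""
    radius = gamma(H, n, 1 - alpha, reps, random.Random(seed))
    return lambda X: 0 if dist_to_H(X, H) <= radius else 1


H0 = [0.3, 0.35]
psi = make_test(H0, n=300, alpha=0.05)
rng = random.Random(1)
print(psi(sample(0.3, 300, rng)), psi(sample(0.7, 300, rng)))
```

The first call returns 0 for most samples, while a sample from a process far from $H_0$ is rejected. Since $\gamma_n$ is only estimated, this sketch does not inherit the exact level-$\alpha$ guarantee of $\psi^\alpha_{H_0}$.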



We will often omit the subscript $H_0$ from $\psi^\alpha_{H_0}$ when it can cause no confusion.

The main result of this work is the following theorem, whose proof is given in Section 6.

Theorem 1. Let $H_0\subset\mathcal E$. The following statements are equivalent:
(i) There exists a consistent test for $H_0$ against $\mathcal E\setminus H_0$.
(ii) The test $\psi^\alpha_{H_0}$ is consistent.
(iii) The set $H_0$ has probability 1 with respect to the ergodic decomposition of every $\rho$ in the closure of $H_0$: $W_\rho(H_0)=1$ for each $\rho\in\operatorname{cl}H_0$.



4 Examples 

The first simple illustration of Theorem 1 above is identity testing, or goodness of fit: testing whether a distribution generating the sample obeys a certain given law, versus it does not. Let $\rho\in\mathcal E$ and $H_0=\{\rho\}$. Since $H_0$ is closed and the ergodic decomposition of the ergodic process $\rho$ is concentrated on $\rho$ itself (so that $W_\rho(H_0)=1$), Theorem 1 implies that there is a consistent test for $H_0$. Identity testing is a classical problem of mathematical statistics, with solutions (e.g. based on Pearson's $\chi^2$ statistic) for i.i.d. data (e.g. [8]) and Markov chains [5]. For stationary ergodic processes, [12] gives a consistent test when the process in $H_0$ has a finite and bounded memory, and [13] does so for the general case.

Another example is bounding the order of a Markov or a hidden Markov process. Theorem 1 implies that for any given $k\in\mathbb N$ there is a consistent test of the hypothesis $\mathcal M^k$ = "the process is Markov of order not greater than $k$" (against $\mathcal E\setminus\mathcal M^k$). Moreover, there is a consistent test of $\mathcal{HM}^k$ = "the process is given by a hidden Markov process with not more than $k$ states." Indeed, in both cases ($k$-order Markov, hidden Markov with not more than $k$ states), the hypothesis $H_0$ is a parametric family, with a compact set of parameters and a continuous function mapping parameters to processes (that is, to the space $\mathcal S$). The Weierstrass theorem then implies that the image of such a compact parameter set is closed (and compact). Moreover, in both cases $H_0$ is closed under taking ergodic decompositions. Thus, by Theorem 1 there exists a consistent test.
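The continuity of such a parametrization can be seen in the simplest case, binary first-order Markov chains. The sketch below is an illustration only: it does not cover hidden Markov processes, its function names are hypothetical, and the truncated distance uses the assumed weights $w_i=2^{-i}$. It computes word probabilities of the stationary chain with $P(1\mid0)=a$ and $P(0\mid1)=b$, checks normalization and stationarity exactly with fractions, and evaluates the distance between chains with nearby parameters.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; helper names are hypothetical.


def markov_prob(a, b, B: str):
    """rho(B) for the stationary binary first-order Markov chain with
    P(1 | 0) = a and P(0 | 1) = b (requires a + b > 0)."""
    step = {("0", "0"): 1 - a, ("0", "1"): a, ("1", "0"): b, ("1", "1"): 1 - b}
    stationary = {"0": b / (a + b), "1": a / (a + b)}
    prob = stationary[B[0]]
    for s, t in zip(B, B[1:]):
        prob *= step[(s, t)]
    return prob


def words(max_len):
    """B_1, B_2, ... in length-lexicographic order, up to length max_len."""
    return ["".join(t) for n in range(1, max_len + 1) for t in product("01", repeat=n)]


def truncated_distance(params1, params2, max_len=8):
    """Truncation of d between two chains, with the assumed weights w_i = 2^-i."""
    return sum(2.0 ** -i * abs(markov_prob(*params1, B) - markov_prob(*params2, B))
               for i, B in enumerate(words(max_len), start=1))


a, b = Fraction(3, 10), Fraction(2, 5)
# Word probabilities of each length sum to 1, and rho(01) = rho(001) + rho(101).
assert sum(markov_prob(a, b, "".join(t)) for t in product("01", repeat=3)) == 1
assert markov_prob(a, b, "01") == markov_prob(a, b, "001") + markov_prob(a, b, "101")

# Nearby parameters give nearby processes in d.
for h in (0.1, 0.01, 0.001):
    print(h, truncated_distance((0.3, 0.4), (0.3 + h, 0.4)))
```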

The problem of estimating the order of a (hidden) Markov process, based on a sample from it, was addressed in a number of works. In the context of hypothesis testing, consistent tests for $\mathcal M^k$ against $\mathcal M^t$ with $t>k$ were given in [1], see also [2]. For a weaker notion of consistency (the test has to stabilize on the correct answer eventually, with probability 1) the existence of a consistent test for $\mathcal{HM}^k$ was established in [7]. For the notion of consistency considered here, a consistent test for $\mathcal M^k$ was proposed in [11], while for the case of testing $\mathcal{HM}^k$ the result above is apparently new.

5 Uniform testing 

Finally, let us consider a stronger notion of hypothesis testing, which requires a uniform speed of convergence for errors of either type.

A test $\varphi$ is called uniformly consistent if for every $\alpha$ there is an $n_\alpha\in\mathbb N$ such that for every $n\ge n_\alpha$ the probability of error on a sample of size $n$ is less than $\alpha$: $\rho(X\in A^n:\varphi(X)=i)<\alpha$ for every $\rho\in H_{1-i}$ and every $i\in\{0,1\}$.

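For $H_0,H_1\subset\mathcal S$, the uniform test $\varphi_{H_0,H_1}$ is constructed as follows. For each $n\in\mathbb N$ let

$$\varphi_{H_0,H_1}(X_{1..n}):=\begin{cases}0 & \text{if } d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)<d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E),\\ 1 & \text{otherwise.}\end{cases}\tag{2}$$

Theorem 2 (uniform testing). Let $H_0\subset\mathcal S$ and $H_1\subset\mathcal S$. If $W_\rho(H_i)=1$ for every $\rho\in\operatorname{cl}H_i$, $i\in\{0,1\}$, then the test $\varphi_{H_0,H_1}$ is uniformly consistent. Conversely, if there exists a uniformly consistent test for $H_0$ against $H_1$, then $W_\rho(H_{1-i})=0$ for any $\rho\in\operatorname{cl}H_i$, $i\in\{0,1\}$.

The proof is given in the next section.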
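The minimum-distance rule (2) can be sketched for finite hypotheses consisting of binary i.i.d. processes; for such sets the closures are the sets themselves and $W_\rho(H_i)=1$ holds trivially. The truncation of $d$ to words of length at most 3, the weights $w_i=2^{-i}$ and all function names are assumptions of this illustration, not part of the construction above.

```python
import random
from itertools import product

# Illustrative sketch with hypothetical names. Assumptions: binary alphabet, d truncated
# to words of length <= 3 with w_i = 2^-i, H0 and H1 finite sets of i.i.d. processes.
WORDS = ["".join(t) for n in range(1, 4) for t in product("01", repeat=n)]
WEIGHTS = [2.0 ** -i for i in range(1, len(WORDS) + 1)]


def frequencies(X):
    """Vector of nu(X, B) over WORDS (X is assumed to have length >= 3)."""
    return [sum(1 for i in range(len(X) - len(B) + 1) if X[i:i + len(B)] == B)
            / (len(X) - len(B) + 1) for B in WORDS]


def dist_to_H(X, H):
    """Empirical d(X, H): minimum over rho in H of sum_i w_i |nu(X, B_i) - rho(B_i)|."""
    f = frequencies(X)
    return min(sum(w * abs(fb - p1 ** B.count("1") * (1 - p1) ** B.count("0"))
                   for w, fb, B in zip(WEIGHTS, f, WORDS)) for p1 in H)


def uniform_test(X, H0, H1):
    """phi_{H0,H1}(X) as in (2): 0 if X is strictly closer to H0 than to H1, else 1."""
    return 0 if dist_to_H(X, H0) < dist_to_H(X, H1) else 1


def sample(p1, n, rng):
    """X_1..X_n from the binary i.i.d. process with P(X_t = '1') = p1."""
    return "".join("1" if rng.random() < p1 else "0" for _ in range(n))


H0, H1 = [0.2, 0.3], [0.6, 0.7]
rng = random.Random(0)
print(uniform_test(sample(0.2, 2000, rng), H0, H1),
      uniform_test(sample(0.7, 2000, rng), H0, H1))
```

Unlike $\psi^\alpha_{H_0}$, this rule needs no calibration, since it only compares two empirical distances; its guarantees come from the separation between the two hypotheses.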



6 Proofs 



The proof of the main results will use the following lemmas. 






Lemma 1 ($d$ is consistent). Let $\rho,\xi\in\mathcal E$ and let a sample $X_{1..k}$ be generated by $\rho$. Then

$$\lim_{k\to\infty}d(X_{1..k},\xi)=d(\rho,\xi)\quad\rho\text{-a.s.}$$

The proof is based on the fact that the frequency of each word converges to its expectation. For each $\varepsilon$ we can find a time by which the first $K(\varepsilon)$ frequencies will have converged up to $\varepsilon$, where $K(\varepsilon)$ is such that the cumulative weight of the rest of the frequencies is smaller than $\varepsilon$ too.

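Proof. For any $\varepsilon>0$ find such an index $J$ that $\sum_{j=J+1}^{\infty}w_j<\varepsilon/2$. For each $j$ we have $\lim_{k\to\infty}\nu(X_{1..k},B_j)=\rho(B_j)$ a.s., so that $|\nu(X_{1..k},B_j)-\rho(B_j)|<\varepsilon/(2Jw_j)$ from some $k$ on; denote $K_j$ this $k$. Let $K=\max_{j\le J}K_j$ ($K$ depends on the realization $X_1,X_2,\dots$). Thus, for $k>K$ we have

$$\begin{aligned}
|d(X_{1..k},\xi)-d(\rho,\xi)| &= \Bigl|\sum_{i=1}^{\infty}w_i\bigl(|\nu(X_{1..k},B_i)-\xi(B_i)|-|\rho(B_i)-\xi(B_i)|\bigr)\Bigr|\\
&\le \sum_{i=1}^{\infty}w_i\,|\nu(X_{1..k},B_i)-\rho(B_i)| \le \sum_{i=1}^{J}w_i\,|\nu(X_{1..k},B_i)-\rho(B_i)|+\varepsilon/2\\
&\le \sum_{i=1}^{J}w_i\,\frac{\varepsilon}{2Jw_i}+\varepsilon/2=\varepsilon,
\end{aligned}$$

which proves the statement. □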
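Lemma 1 can be observed numerically. The simulation below is an illustration under assumptions of our own (binary i.i.d. processes $\rho$ and $\xi$, $d$ truncated to words of length at most 3 with $w_i=2^{-i}$, hypothetical function names). It draws one realization from $\rho$ and reports $|d(X_{1..k},\xi)-d(\rho,\xi)|$ for increasing $k$.

```python
import random
from itertools import product

# Illustrative sketch with hypothetical names. Assumptions: binary i.i.d. processes
# given by P(X_t = '1'), d truncated to words of length <= 3 with w_i = 2^-i.
WORDS = ["".join(t) for n in range(1, 4) for t in product("01", repeat=n)]
WEIGHTS = [2.0 ** -i for i in range(1, len(WORDS) + 1)]


def iid_prob(p1, B):
    """mu(B) for the binary i.i.d. process with P(X_t = '1') = p1."""
    return p1 ** B.count("1") * (1 - p1) ** B.count("0")


def frequency(X, B):
    """nu(X, B) from (1)."""
    positions = len(X) - len(B) + 1
    if positions <= 0:
        return 0.0
    return sum(1 for i in range(positions) if X[i:i + len(B)] == B) / positions


def d_sample(X, xi):
    """Empirical distance d(X_{1..k}, xi)."""
    return sum(w * abs(frequency(X, B) - iid_prob(xi, B)) for w, B in zip(WEIGHTS, WORDS))


def d_process(rho, xi):
    """d(rho, xi) between the two i.i.d. processes (same truncation)."""
    return sum(w * abs(iid_prob(rho, B) - iid_prob(xi, B)) for w, B in zip(WEIGHTS, WORDS))


rho, xi = 0.3, 0.5
rng = random.Random(0)
X = "".join("1" if rng.random() < rho else "0" for _ in range(20000))
target = d_process(rho, xi)
for k in (100, 1000, 20000):
    print(k, abs(d_sample(X[:k], xi) - target))
```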

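Lemma 2 (smooth probabilities of deviation). Let $m>2k>1$, $\rho\in\mathcal S$, $H\subset\mathcal S$, and $\varepsilon>0$. Then

$$\rho\bigl(d(X_{1..m},H)\ge\varepsilon\bigr)\le\rho\Bigl(d(X_{1..k},H)\ge\varepsilon-\frac{2k}{m-k+1}-t_k\Bigr),\tag{3}$$

where $t_k$ is the sum of all the weights of tuples longer than $k$ in the definition of $d$: $t_k:=\sum_{i:|B_i|>k}w_i$, and

$$\rho\bigl(d(X_{1..m},H)\le\varepsilon\bigr)\le\rho\Bigl(d(X_{1..k},H)\le\frac{m}{m-2k}\,\varepsilon+\frac{2k}{m-2k}\Bigr).\tag{4}$$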

The meaning of this lemma is as follows. For any word $X_{1..m}$, if it is far away from (or close to) a given distribution $\mu$ (in the empirical distributional distance), then some of its shorter subwords $X_{i..i+k-1}$ are far from (close to) $\mu$ too. By stationarity, we may assume that $i=1$. Therefore, the probability of a $\delta$-ball of samples of a given length is close to the probability of a $\delta$-ball of samples of smaller size. In other words, for a stationary distribution $\rho$, it cannot happen that a small sample is likely to be close to $\rho$, but a larger sample is likely to be far.
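Proof. Let $B$ be a tuple such that $|B|\le k$ and let $X_{1..m}\in A^m$ be any sample of size $m>2k$. The number of occurrences of $B$ in $X_{1..m}$ can be bounded by the number of occurrences of $B$ in subwords of $X_{1..m}$ of length $k$ as follows:

$$\#(X_{1..m},B)\le\frac{1}{k-|B|+1}\sum_{i=1}^{m-k+1}\#(X_{i..i+k-1},B)+2k=\sum_{i=1}^{m-k+1}\nu(X_{i..i+k-1},B)+2k.$$

Indeed, summing over $i=1..m-k+1$ the number of occurrences of $B$ in all $X_{i..i+k-1}$ we count each occurrence of $B$ exactly $k-|B|+1$ times, except for those that occur in the first and last $k$ symbols. Dividing by $m-|B|+1$, and using the definition (1) and $|B|\le k$, we obtain

$$\nu(X_{1..m},B)\le\frac{1}{m-k+1}\Bigl(\sum_{i=1}^{m-k+1}\nu(X_{i..i+k-1},B)+2k\Bigr).\tag{5}$$

Since, on the other hand, no occurrence of $B$ is counted more than $k-|B|+1$ times, we also have $\#(X_{1..m},B)\ge\sum_{i=1}^{m-k+1}\nu(X_{i..i+k-1},B)$; together with (5) this gives $|\nu(X_{1..m},B)-\mu(B)|\le\frac{1}{m-k+1}\sum_{i=1}^{m-k+1}|\nu(X_{i..i+k-1},B)-\mu(B)|+\frac{2k}{m-k+1}$ for every $\mu\in\mathcal S$. Summing over all $B$ with the weights $w_i$ (which uses that the weights sum to at most 1, as is the case e.g. for $w_k=2^{-k}$), for any $\mu$ we get

$$d(X_{1..m},\mu)\le\frac{1}{m-k+1}\sum_{i=1}^{m-k+1}d(X_{i..i+k-1},\mu)+\frac{2k}{m-k+1}+t_k,\tag{6}$$

where in the right-hand side $t_k$ corresponds to all the summands in the left-hand side for which $|B|>k$, while for the rest of the summands we used $|B|\le k$. Since this holds for any $\mu$, we conclude that

$$d(X_{1..m},H)\le\frac{1}{m-k+1}\sum_{i=1}^{m-k+1}d(X_{i..i+k-1},H)+\frac{2k}{m-k+1}+t_k.$$

Therefore, for any $X_{1..m}\in A^m$, if $d(X_{1..m},H)\ge\varepsilon$ then there is an index $i\le m-k+1$ such that $d(X_{i..i+k-1},H)\ge\varepsilon-\frac{2k}{m-k+1}-t_k$. Moreover, we have (by the definition of stationarity)

$$\rho\bigl(d(X_{i..i+k-1},H)\ge\varepsilon'\bigr)=\rho\bigl(d(X_{1..k},H)\ge\varepsilon'\bigr),$$

where $\varepsilon':=\varepsilon-\frac{2k}{m-k+1}-t_k$. So we have

$$\rho\bigl(d(X_{1..k},H)\ge\varepsilon'\bigr)\ge\rho\bigl(d(X_{1..m},H)\ge\varepsilon\bigr),$$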
proving (3). The second statement can be proven similarly; indeed, analogously to (5) we have

$$\nu(X_{1..m},B)\ge\frac{1}{m-|B|+1}\sum_{i=1}^{m-k+1}\nu(X_{i..i+k-1},B)\ge\frac{m-2k}{m}\cdot\frac{1}{m-k+1}\sum_{i=1}^{m-k+1}\nu(X_{i..i+k-1},B),$$

where we have used $|B|\ge1$ (so that $m-|B|+1\le m$) and $m-k+1\ge m-2k$. Summing over different $B$, we obtain (similarly to (6))

$$d(X_{1..m},\mu)\ge\frac{m-2k}{m}\cdot\frac{1}{m-k+1}\sum_{i=1}^{m-k+1}d(X_{i..i+k-1},\mu)-\frac{2k}{m}$$

(since the frequencies are non-negative, there is no $t_k$ term here), which, using stationarity of $\rho$, implies (4). □
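Inequality (6) is deterministic, so it can be spot-checked numerically. The sketch below is only such a sanity check, under assumptions of our own: $d$ is truncated to words of length at most 4 with weights $w_i=2^{-i}$ (whose total is below 1), $t_k$ is computed within that truncation, $\mu$ ranges over random binary i.i.d. processes, and the function names are hypothetical. It records the smallest margin between the two sides of (6) over random samples.

```python
import random
from itertools import product

# Illustrative sketch with hypothetical names; the truncation and weights are assumptions.
MAX_LEN = 4
WORDS = ["".join(t) for n in range(1, MAX_LEN + 1) for t in product("01", repeat=n)]
WEIGHTS = [2.0 ** -i for i in range(1, len(WORDS) + 1)]


def frequency(X, B):
    """nu(X, B) from (1)."""
    positions = len(X) - len(B) + 1
    if positions <= 0:
        return 0.0
    return sum(1 for i in range(positions) if X[i:i + len(B)] == B) / positions


def d_emp(X, mu):
    """Truncated d(X, mu); mu maps each word to its probability."""
    return sum(w * abs(frequency(X, B) - mu[B]) for w, B in zip(WEIGHTS, WORDS))


def t(k):
    """t_k: total weight of the words longer than k (within the truncation)."""
    return sum(w for w, B in zip(WEIGHTS, WORDS) if len(B) > k)


def rhs_of_6(X, mu, k):
    """Right-hand side of (6): window average of d, plus 2k/(m-k+1), plus t_k."""
    m = len(X)
    windows = [X[i:i + k] for i in range(m - k + 1)]
    average = sum(d_emp(win, mu) for win in windows) / (m - k + 1)
    return average + 2 * k / (m - k + 1) + t(k)


rng = random.Random(0)
worst_gap = float("inf")
for _ in range(200):
    m, k = 40, rng.randint(1, MAX_LEN)  # m > 2k as in Lemma 2
    X = "".join(rng.choice("01") for _ in range(m))
    p1 = rng.random()
    mu = {B: p1 ** B.count("1") * (1 - p1) ** B.count("0") for B in WORDS}
    worst_gap = min(worst_gap, rhs_of_6(X, mu, k) - d_emp(X, mu))
print("smallest margin rhs - lhs over 200 trials:", worst_gap)
```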

Lemma 3. Let $\rho_k\in\mathcal S$, $k\in\mathbb N$, be a sequence of processes that converges to a process $\rho_*$. Then, for any $n\in\mathbb N$, $T\subset A^n$ and $\varepsilon>0$, if $\rho_k(T)>\varepsilon$ for infinitely many indices $k$, then $\rho_*(T)\ge\varepsilon$.

Proof. The statement follows from the fact that $\rho(T)$ is continuous as a function of $\rho$. □

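Proof of Theorem 1. The implication (ii) $\Rightarrow$ (i) is obvious. We will show (iii) $\Rightarrow$ (ii) and (i) $\Rightarrow$ (iii). To establish the former, we have to show that the family of tests $\psi^\alpha$ is consistent. By construction, for any $\rho\in\operatorname{cl}H_0\cap\mathcal E$ we have $\rho(\psi^\alpha(X_{1..n})=1)\le\alpha$.

To prove the consistency of $\psi$, it remains to show that, for any $\xi\in\mathcal E\setminus H_0$ and $\alpha\in(0,1)$, $\xi$-a.s. $\psi^\alpha(X_{1..n})=1$ from some $n$ on. To do this, fix any $\xi\in\mathcal E\setminus H_0$ and let $\Delta:=d(\xi,\operatorname{cl}H_0):=\inf_{\rho\in\operatorname{cl}H_0\cap\mathcal E}d(\xi,\rho)$. If $\xi$ belonged to $\operatorname{cl}H_0$, then by (iii) we would have $W_\xi(H_0)=1$; since $\xi$ is ergodic, $W_\xi$ is concentrated on $\xi$ itself, which would give $\xi\in H_0$. Hence $\xi\notin\operatorname{cl}H_0$ and, as $\operatorname{cl}H_0$ is closed, $\Delta>0$. Suppose that there exists an $\alpha>0$ such that, for infinitely many $n$, some samples from the $\Delta/2$-neighbourhood of $n$-samples around $\xi$ are sorted as $H_0$ by $\psi^\alpha$, that is, $C_n(\operatorname{cl}H_0\cap\mathcal E,1-\alpha)\cap u_n^{\Delta/2}(\{\xi\})\ne\emptyset$. Then, by the triangle inequality, for these $n$ we have $\gamma_n(\operatorname{cl}H_0\cap\mathcal E,1-\alpha)\ge\Delta/2$.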

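This means that there exist an increasing sequence $n_k$, $k\in\mathbb N$, and a sequence $\rho_k\in\operatorname{cl}H_0\cap\mathcal E$, $k\in\mathbb N$, such that (since $\Delta/4<\gamma_{n_k}(\operatorname{cl}H_0\cap\mathcal E,1-\alpha)$)

$$\rho_k\bigl(u_{n_k}^{\Delta/4}(\operatorname{cl}H_0\cap\mathcal E)\bigr)<1-\alpha,\quad\text{that is,}\quad\rho_k\bigl(d(X_{1..n_k},\operatorname{cl}H_0\cap\mathcal E)>\Delta/4\bigr)>\alpha.$$

Since the set $\operatorname{cl}H_0$ is compact (as a closed subset of the compact set $\mathcal S$), we may assume (passing to a subsequence, if necessary) that $\rho_k$ converges to a certain $\rho_*\in\operatorname{cl}H_0$. Using Lemma 2 (3), for every $m$ large enough to satisfy $t_m<\Delta/16$ and every $k$ large enough to satisfy $n_k>2m$ and $\frac{2m}{n_k-m+1}<\Delta/16$, we have

$$\rho_k\bigl(d(X_{1..m},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/8\bigr)>\alpha.$$

Since this holds for infinitely many $k$, using Lemma 3 (with $T:=\{X\in A^m: d(X,\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/8\}$) we conclude that

$$\rho_*\bigl(d(X_{1..m},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/8\bigr)\ge\alpha.$$

Since the latter inequality holds for all large enough $m$, we also have

$$\rho_*\Bigl(\limsup_{n\to\infty}d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/8\Bigr)>0.$$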
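However, we must have $\rho_*\bigl(\lim_{n\to\infty}d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)=0\bigr)=1$ for every $\rho_*\in\operatorname{cl}H_0$: indeed, for $\rho_*\in\operatorname{cl}H_0\cap\mathcal E$ it follows from Lemma 1, and for $\rho_*\in\operatorname{cl}H_0\setminus\mathcal E$ from Lemma 1, the ergodic decomposition and the conditions of the theorem ($W_\rho(H_0)=1$ for $\rho\in\operatorname{cl}H_0$).

This contradiction shows that for every $\alpha$ there are not more than finitely many $n$ for which $C_n(\operatorname{cl}H_0\cap\mathcal E,1-\alpha)\cap u_n^{\Delta/2}(\{\xi\})\ne\emptyset$. To finish the proof of the implication, it remains to note that, as follows from Lemma 1,

$$\xi\bigl(\{X_1,X_2,\dots: X_{1..n}\in u_n^{\Delta/2}(\{\xi\})\text{ from some } n \text{ on}\}\bigr)\ge\xi\Bigl(\lim_{n\to\infty}d(X_{1..n},\xi)=0\Bigr)=1,$$

so that, $\xi$-a.s., from some $n$ on $X_{1..n}\notin C_n(\operatorname{cl}H_0\cap\mathcal E,1-\alpha)$, that is, $\psi^\alpha(X_{1..n})=1$.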



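To establish the implication (i) $\Rightarrow$ (iii), we assume that there exists a consistent test $\psi$ for $H_0$, and we will show that $W_\rho(\mathcal E\setminus H_0)=0$ for every $\rho\in\operatorname{cl}H_0$. Take $\rho\in\operatorname{cl}H_0$ and suppose that $W_\rho(\mathcal E\setminus H_0)=\delta>0$; fix any $\alpha\in(0,3\delta/4)$. We have

$$\limsup_{n\to\infty}\int_{\mathcal E\setminus H_0}dW_\rho(\mu)\,\mu(\psi^\alpha_n=0)\le\int_{\mathcal E\setminus H_0}\limsup_{n\to\infty}\mu(\psi^\alpha_n=0)\,dW_\rho(\mu)=0,$$

where the inequality follows from Fatou's lemma (the functions under the integral are all bounded by 1), and the equality from the consistency of $\psi$. Thus, from some $n$ on we will have $\int_{\mathcal E\setminus H_0}dW_\rho(\mu)\,\mu(\psi^\alpha_n=0)<\delta/4$, so that $\rho(\psi^\alpha_n=0)<1-3\delta/4$. For any set $T\subset A^n$ the function $\mu(T)$ is continuous as a function of $\mu$. In particular, this holds for the set $T:=\{X_{1..n}:\psi^\alpha_n(X_{1..n})=0\}$. Therefore, since $\rho\in\operatorname{cl}H_0$, for any $n$ large enough we can find a $\rho'\in H_0$ such that $\rho'(\psi^\alpha_n=0)<1-3\delta/4$, that is, $\rho'(\psi^\alpha_n=1)>3\delta/4>\alpha$, which contradicts the consistency of $\psi$. Thus, $W_\rho(H_0)=1$, and Theorem 1 is proven. □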
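Proof of Theorem 2. To prove the first statement of the theorem, we will show that the test $\varphi_{H_0,H_1}$ is a uniformly consistent test for $\operatorname{cl}H_0\cap\mathcal E$ against $\operatorname{cl}H_1\cap\mathcal E$ (and hence for $H_0$ against $H_1$), under the conditions of the theorem. Suppose that, on the contrary, for some $\alpha>0$ and every $n'\in\mathbb N$ there is a process $\rho\in\operatorname{cl}H_0$ such that $\rho(\varphi(X_{1..n})=1)>\alpha$ for some $n>n'$. Define

$$\Delta:=d(\operatorname{cl}H_0,\operatorname{cl}H_1):=\inf_{\rho_0\in\operatorname{cl}H_0\cap\mathcal E,\ \rho_1\in\operatorname{cl}H_1\cap\mathcal E}d(\rho_0,\rho_1),$$

which is positive since $\operatorname{cl}H_0$ and $\operatorname{cl}H_1$ are closed (hence compact) and disjoint. We have

$$\begin{aligned}
\alpha&<\rho\bigl(\varphi(X_{1..n})=1\bigr)\\
&\le\rho\bigl(d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2\ \text{or}\ d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2\bigr)\\
&\le\rho\bigl(d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2\bigr)+\rho\bigl(d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2\bigr).
\end{aligned}\tag{7}$$

This implies that either $\rho(d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2)>\alpha/2$ or $\rho(d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2)>\alpha/2$, so that, by assumption, at least one of these inequalities holds for infinitely many $n\in\mathbb N$ for some sequence $\rho_n\in\operatorname{cl}H_0$. Suppose that it is the first one, that is, there is an increasing sequence $n_i$, $i\in\mathbb N$, and a sequence $\rho_i\in\operatorname{cl}H_0$, $i\in\mathbb N$, such that

$$\rho_i\bigl(d(X_{1..n_i},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2\bigr)>\alpha/2\quad\text{for all } i\in\mathbb N.\tag{8}$$

The set $\mathcal S$ is compact, hence so is its closed subset $\operatorname{cl}H_0$. Therefore, the sequence $\rho_i$, $i\in\mathbb N$, must contain a subsequence that converges to a certain process $\rho_*\in\operatorname{cl}H_0$. Passing to a subsequence if necessary, we may assume that this convergent subsequence is the sequence $\rho_i$, $i\in\mathbb N$, itself.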

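Using Lemma 2 (3) (with $\rho=\rho_m$, the sample sizes $n_m$ and $n_k$ in place of $m$ and $k$, and $H=\operatorname{cl}H_0\cap\mathcal E$), and taking $k$ large enough to have $t_{n_k}<\Delta/8$, for every $m$ large enough to have $n_m>2n_k$ and $\frac{2n_k}{n_m-n_k+1}<\Delta/8$, we obtain

$$\rho_m\bigl(d(X_{1..n_k},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/4\bigr)\ge\rho_m\bigl(d(X_{1..n_m},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2\bigr)>\alpha/2.\tag{9}$$

That is, we have shown that for any large enough index $n_k$ the inequality $\rho_m(d(X_{1..n_k},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/4)>\alpha/2$ holds for infinitely many indices $m$. From this and Lemma 3 with $T_k:=\{X\in A^{n_k}: d(X,\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/4\}$ we conclude that $\rho_*(T_k)\ge\alpha/2$. The latter holds for all large enough $k$; that is, $\rho_*(d(X_{1..n_k},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/4)\ge\alpha/2$ infinitely often. Therefore,

$$\rho_*\Bigl(\limsup_{n\to\infty}d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/4\Bigr)>0.$$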

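However, we must have

$$\rho_*\Bigl(\lim_{n\to\infty}d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)=0\Bigr)=1$$

for every $\rho_*\in\operatorname{cl}H_0$: indeed, for $\rho_*\in\operatorname{cl}H_0\cap\mathcal E$ it follows from Lemma 1, and for $\rho_*\in\operatorname{cl}H_0\setminus\mathcal E$ from Lemma 1, the ergodic decomposition and the conditions of the theorem.

Thus, we have arrived at a contradiction that shows that $\rho_n(d(X_{1..n},\operatorname{cl}H_0\cap\mathcal E)\ge\Delta/2)>\alpha/2$ cannot hold for infinitely many $n\in\mathbb N$ for any sequence of $\rho_n\in\operatorname{cl}H_0$. Analogously, we can show that $\rho_n(d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2)>\alpha/2$ cannot hold for infinitely many $n\in\mathbb N$ for any sequence of $\rho_n\in\operatorname{cl}H_0$. Indeed, using Lemma 2, equation (4), we can show that $\rho_{n_m}(d(X_{1..n_m},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2)>\alpha/2$ for a large enough $n_m$ implies $\rho_{n_m}(d(X_{1..n_k},\operatorname{cl}H_1\cap\mathcal E)\le3\Delta/4)>\alpha/2$ for a smaller $n_k$. Therefore, if we assume that $\rho_n(d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le\Delta/2)>\alpha/2$ for infinitely many $n\in\mathbb N$ for some sequence of $\rho_n\in\operatorname{cl}H_0$, then we will also find a $\rho_*\in\operatorname{cl}H_0$ for which $\rho_*(d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\le3\Delta/4)\ge\alpha/2$ for infinitely many $n$, which, using Lemma 1 and the ergodic decomposition, can be shown to contradict the fact that $\rho_*\bigl(\liminf_{n\to\infty}d(X_{1..n},\operatorname{cl}H_1\cap\mathcal E)\ge\Delta\bigr)=1$.

Thus, returning to (7), we have shown that from some $n$ on there is no $\rho\in\operatorname{cl}H_0$ for which $\rho(\varphi=1)>\alpha$ holds true. The statement for $\rho\in\operatorname{cl}H_1$ can be proven analogously, thereby finishing the proof of the first statement.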

To prove the second statement of the theorem, we assume that there exists a uniformly consistent test $\varphi$ for $H_0$ against $H_1$, and we will show that $W_\rho(H_{1-i})=0$ for every $\rho\in\operatorname{cl}H_i$. Indeed, let $\rho\in\operatorname{cl}H_0$, that is, suppose that there is a sequence $\xi_i\in H_0$, $i\in\mathbb N$, such that $\xi_i\to\rho$. Assume $W_\rho(H_1)=\delta>0$ and take $\alpha:=\delta/2$. Since the test $\varphi$ is uniformly consistent, there is an $N\in\mathbb N$ such that for every $n\ge N$ we have

$$\rho\bigl(\varphi(X_{1..n})=0\bigr)=\int_{H_1}\mu\bigl(\varphi(X_{1..n})=0\bigr)\,dW_\rho(\mu)+\int_{\mathcal E\setminus H_1}\mu\bigl(\varphi(X_{1..n})=0\bigr)\,dW_\rho(\mu)<\delta\alpha+1-\delta\le1-\delta/2.$$

Recall that, for $T\subset A^n$, $\mu(T)$ is a continuous function of $\mu$. In particular, this holds for the set $T=\{X\in A^n:\varphi(X)=0\}$, for any given $n\in\mathbb N$. Therefore, for every $n\ge N$ and for every $i$ large enough, $\rho(\varphi(X_{1..n})=0)<1-\delta/2$ implies also $\xi_i(\varphi(X_{1..n})=0)<1-\delta/2$, that is, $\xi_i(\varphi(X_{1..n})=1)>\delta/2=\alpha$, which contradicts $\xi_i\in H_0$. This contradiction shows $W_\rho(H_1)=0$ for every $\rho\in\operatorname{cl}H_0$. The case $\rho\in\operatorname{cl}H_1$ is analogous. □